1. Load Packages

source("./Mean Reversion/RMR.001 Load Packages.R") 

2. Load Data

pricing_data <- read_csv("./Mean Reversion/Raw Data/pricing data.csv") 
## Parsed with column specification:
## cols(
##   date_unix = col_integer(),
##   date_time = col_datetime(format = ""),
##   high = col_double(),
##   low = col_double(),
##   open = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   quote_volume = col_double(),
##   weighted_average = col_double(),
##   currency_pair = col_character(),
##   period = col_integer()
## )

3. Prepare Data Function

Description
Spreads Poloniex pricing data into wide format and filters the data to a specified time resolution and time window.

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
start_date: The start date of the time window.
end_date: The end date of the time window.

prepare_data <- function(pricing_data, time_resolution, start_date, end_date) { 
  df <- pricing_data %>% 
    filter(period == time_resolution, 
           date_time >= start_date, 
           date_time <= end_date) %>% 
    select(date_unix, date_time, close, currency_pair) %>% 
    spread(currency_pair, close) 
  return(df)
} 
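
As a rough illustration of the reshaping step, the sketch below runs the same filter, select, and spread verbs on a tiny made-up table. The table toy_pricing and its prices are hypothetical and only mimic the column layout of pricing_data. The date filter is left out to keep the example short.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same tidy layout as pricing_data
toy_pricing <- tibble(
  date_unix = rep(c(1483228800L, 1483229100L), times = 2),
  date_time = rep(as.POSIXct(c("2017-01-01 00:00:00", "2017-01-01 00:05:00"),
                             tz = "UTC"), times = 2),
  close = c(960, 962, 8.1, 8.2),  # made-up closing prices
  currency_pair = rep(c("USDT_BTC", "USDT_ETH"), each = 2),
  period = 300L
)

# Same verbs as prepare_data(): one row per timestamp, one column per pair
toy_wide <- toy_pricing %>%
  filter(period == 300) %>%
  select(date_unix, date_time, close, currency_pair) %>%
  spread(currency_pair, close)

print(toy_wide)
```

The result has one row per timestamp and one closing-price column per currency pair, which is the shape the later functions index with train[[coin_y]] and train[[coin_x]].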

4. Test Cointegration Function

Description
The Engle-Granger method is used to test for cointegration. The method consists of two steps: (1) Perform a linear regression of log(coin_y) on log(coin_x). (2) Perform an Augmented Dickey-Fuller (ADF) test on the residuals from the regression estimated in (1). The ADF test is specified with a non-zero mean (drift), no time trend, and one lagged difference term. The function returns the ADF test statistic.

Arguments
coin_y: A vector containing the pricing data for the dependent coin in the regression.
coin_x: A vector containing the pricing data for the independent coin in the regression.

test_cointegration <- function(coin_y, coin_x) { 
  lm_model <- lm(log(coin_y) ~ log(coin_x))  
  lm_residuals <- lm_model[["residuals"]] 
  adf_test <- ur.df(lm_residuals, type = "drift", lags = 1) 
  # Row 2 is the lagged residual level (z.lag.1); column 3 is its t value
  df_stat <- adf_test@testreg[["coefficients"]][2, 3]
  return(df_stat) 
} 
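
The two steps can be sketched on simulated data using base R only. This is an illustrative stand-in, not the notebook's method: the ADF regression is written out by hand instead of calling ur.df(), and the simulated series, seed, and coefficients are assumptions chosen so that the pair is cointegrated by construction.

```r
set.seed(42)
n <- 500

# Simulated prices (assumption): coin_x follows a random walk in logs and
# log(coin_y) is a linear function of log(coin_x) plus stationary noise
coin_x <- exp(3 + cumsum(rnorm(n, sd = 0.01)))
coin_y <- exp(0.5 + 0.8 * log(coin_x) + rnorm(n, sd = 0.02))

# Step 1: regress log(coin_y) on log(coin_x) and keep the residuals
res <- unname(residuals(lm(log(coin_y) ~ log(coin_x))))

# Step 2: ADF regression with drift and one lagged difference:
#   diff(res)[t] ~ intercept + res[t - 1] + diff(res)[t - 1]
d_res <- diff(res)
m <- embed(d_res, 2)        # column 1: current difference, column 2: lagged difference
d_now <- m[, 1]
d_lag <- m[, 2]
z_lag <- res[2:(n - 1)]     # lagged level aligned with d_now

adf_fit <- lm(d_now ~ z_lag + d_lag)

# The test statistic is the t value on the lagged level
adf_stat <- summary(adf_fit)$coefficients["z_lag", "t value"]
print(adf_stat)
```

A strongly negative statistic indicates that the residuals revert to their mean, which is the behaviour the strategy relies on.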

5. Create Coin Pairs Function

Description
Two sets of currency pairs are examined: currency pairs where USDT is the quote currency and currency pairs where BTC is the quote currency. All combinations of coins within each set are created. Combinations that consist of the coin with itself are removed. The function returns a dataframe containing the coin pairs.

create_pairs <- function() { 
  coins_usdt <- c("USDT_BTC", "USDT_DASH", "USDT_ETH", "USDT_LTC", "USDT_REP", "USDT_XMR", "USDT_ZEC")
  coins_btc <- c("BTC_DASH", "BTC_ETH", "BTC_LTC", "BTC_REP", "BTC_XEM", "BTC_XMR", "BTC_ZEC")
  coin_pairs <- rbind(expand.grid(coins_usdt, coins_usdt), expand.grid(coins_btc, coins_btc)) %>% 
    rename(coin_y = Var1, 
           coin_x = Var2) %>% 
    filter(coin_y != coin_x) %>% 
    mutate_if(is.factor, as.character) %>%
    as_tibble() 
  return(coin_pairs)
} 
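
A small base R sketch of the same pairing logic, using a three-coin subset for readability. The subset is an illustration only; the function above uses seven coins per quote currency.

```r
# Three-coin subset for illustration only
coins <- c("USDT_BTC", "USDT_ETH", "USDT_LTC")

# All ordered combinations, then drop each coin paired with itself
pairs <- expand.grid(coin_y = coins, coin_x = coins, stringsAsFactors = FALSE)
pairs <- pairs[pairs$coin_y != pairs$coin_x, ]
print(pairs)

# Pair count for the full lists: 7 * 7 - 7 ordered pairs per quote currency
n_full <- 2 * (7 * 7 - 7)
print(n_full)
```

The pairs are ordered, so each combination appears twice: once with a given coin as coin_y and once with it as coin_x. This is why mirrored pairs appear next to each other in the results tables.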

6. Test Coin Pairs Function

Description
Test for cointegration between each coin pair generated by the create_pairs() function. The test for cointegration is performed by the test_cointegration() function. The function returns a dataframe containing the coin pairs and the ADF test statistic resulting from testing cointegration between each coin pair.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().

test_pairs <- function(train, coin_pairs) { 
  adf_stat <- c() 
  for (n in seq_len(nrow(coin_pairs))) { 
    coin_y <- coin_pairs[[n, "coin_y"]] 
    coin_x <- coin_pairs[[n, "coin_x"]] 
    cointegration_results <- test_cointegration(coin_y = train[[coin_y]], coin_x = train[[coin_x]])
    adf_stat <- c(adf_stat, cointegration_results)
  } 
  df <- coin_pairs %>% 
    mutate(adf_stat = adf_stat) %>% 
    arrange(adf_stat)
  return(df) 
} 

7. Select Coin Pairs Function

Description
Select cointegrated coin pairs to be used in a mean reversion strategy. The current selection logic keeps all coin pairs whose ADF test statistic is less than or equal to -3.43, the large-sample 1% critical value that urca reports for this ADF specification.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().

select_pairs <- function(train, coin_pairs) { 
  df <- test_pairs(train = train, coin_pairs = coin_pairs) %>% 
    filter(adf_stat <= -3.43) 
  return(df) 
} 

8. Generate Signals Function

Description
Generate trading signals that indicate the current position in the spread formed by a linear combination of coin y and coin x. A signal of +1 indicates a long position in the spread, 0 indicates a flat position, and -1 indicates a short position in the spread. Signals are generated for the test set using a model trained on the training set.

The current trading logic is to perform a linear regression of log(coin y) on log(coin x) using the training set. A spread is then calculated in the test set using the fitted hedge ratio and intercept from the regression. The z-score of the spread is then calculated using the mean and standard deviation of the training-set residuals. A position is entered when the z-score reaches +threshold_z or -threshold_z and is exited when the z-score returns to 0. Each signal is based on the previous observation's z-score, so it takes effect one period later. Losing positions are exited when the z-score reaches +4 or -4. Because this stop is tracked with cummin() and cummax(), once it has triggered, that side of the trade stays flat for the rest of the test set, even if the z-score returns to within the +4 or -4 range.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

generate_signals <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))   
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_signals <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           signal_long = ifelse(lag(spread_z, 1) <= -threshold_z, 1, NA), 
           signal_long = ifelse(lag(spread_z, 1) >= 0, 0, signal_long), 
           signal_long = ifelse(lag(spread_z, 1) <= -4, 0, signal_long), 
           signal_long = ifelse(lag(cummin(spread_z), 1) <= -4, 0, signal_long), 
           signal_long = na.locf(signal_long, na.rm = FALSE), 
           signal_short = ifelse(lag(spread_z, 1) >= threshold_z, -1, NA), 
           signal_short = ifelse(lag(spread_z, 1) <= 0, 0, signal_short), 
           signal_short = ifelse(lag(spread_z, 1) >= 4, 0, signal_short), 
           signal_short = ifelse(lag(cummax(spread_z), 1) >= 4, 0, signal_short), 
           signal_short = na.locf(signal_short, na.rm = FALSE), 
           signal = signal_long + signal_short, 
           signal = ifelse(is.na(signal), 0, signal)) 
  return(df_signals[["signal"]])
} 
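
To make the signal rules concrete, the sketch below applies the same sequence of ifelse() steps to two short, made-up z-score series. It is a simplified base R stand-in: lag1() and locf() are hypothetical helpers that imitate dplyr's lag() and zoo's na.locf(na.rm = FALSE), and toy_signals() and stop_z are names introduced here for illustration.

```r
# Hypothetical base R helpers standing in for dplyr::lag() and zoo::na.locf()
lag1 <- function(x) c(NA, head(x, -1))
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  c(NA, x)[idx + 1L]  # carry the last non-NA value forward, leading NAs stay NA
}

toy_signals <- function(spread_z, threshold_z, stop_z = 4) {
  prev <- lag1(spread_z)  # signals react to the previous observation
  long <- ifelse(prev <= -threshold_z, 1, NA)                  # enter long
  long <- ifelse(prev >= 0, 0, long)                           # exit at the mean
  long <- ifelse(prev <= -stop_z, 0, long)                     # stop out
  long <- ifelse(lag1(cummin(spread_z)) <= -stop_z, 0, long)   # stay out after a stop
  long <- locf(long)
  short <- ifelse(prev >= threshold_z, -1, NA)                 # enter short
  short <- ifelse(prev <= 0, 0, short)                         # exit at the mean
  short <- ifelse(prev >= stop_z, 0, short)                    # stop out
  short <- ifelse(lag1(cummax(spread_z)) >= stop_z, 0, short)  # stay out after a stop
  short <- locf(short)
  signal <- long + short
  ifelse(is.na(signal), 0, signal)
}

# Made-up z-scores: a long round trip followed by a short round trip
z_round_trip <- c(0.5, -2.3, -1.0, 0.2, 2.5, 1.0, -0.1, 0.0)
signal_round_trip <- toy_signals(z_round_trip, threshold_z = 2)
print(signal_round_trip)  # 0 0 1 1 0 -1 -1 0

# Made-up z-scores: the long position is stopped out below -4
z_stopped <- c(-2.5, -4.5, -3.0, -2.5, 0.1)
signal_stopped <- toy_signals(z_stopped, threshold_z = 2)
print(signal_stopped)     # 0 1 0 0 0
```

In the second series the long signal stays at 0 after the stop, even while the z-score sits between -4 and -2, because the cummin() condition remains true once it has been met.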

9. Backtest Pair Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using coin y and coin x.

The current backtesting logic uses signals generated by generate_signals(). The coin_y_return and coin_x_return indicate the one period percentage return of each coin. The coin_y_position and coin_x_position indicate the market value in USD in each coin. coin_y_pnl and coin_x_pnl indicate the USD value of the profit and loss for each coin. The combined_position indicates the gross market value of the combined positions.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

backtest_pair <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))  
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_backtest <- test %>% 
    mutate(signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           coin_y_return = test[[coin_y]] / lag(test[[coin_y]], 1) - 1, 
           coin_x_return = test[[coin_x]] / lag(test[[coin_x]], 1) - 1, 
           coin_y_position = signal * 1, 
           coin_x_position = signal * hedge_ratio * -1,  
           coin_y_pnl = lag(coin_y_position, 1) * coin_y_return, 
           coin_x_pnl = lag(coin_x_position, 1) * coin_x_return, 
           combined_position = abs(coin_y_position) + abs(coin_x_position), 
           combined_pnl = coin_y_pnl + coin_x_pnl, 
           combined_return = combined_pnl / lag(combined_position, 1)) %>% 
    mutate_all(funs(ifelse(is.na(.), 0, .))) %>%
    mutate(return_pair = cumprod(1 + combined_return)) 
  return(df_backtest[["return_pair"]])
} 
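
The per-period arithmetic can be checked by hand for a single observation. The hedge ratio and returns below are made-up numbers, not values from the data.

```r
hedge_ratio <- 0.5     # hypothetical fitted slope
signal <- 1            # long the spread
coin_y_return <- 0.02  # hypothetical one-period returns
coin_x_return <- 0.01

# Long 1 unit of coin y, short hedge_ratio units of coin x
coin_y_position <- signal * 1
coin_x_position <- signal * hedge_ratio * -1

coin_y_pnl <- coin_y_position * coin_y_return                     # 0.02
coin_x_pnl <- coin_x_position * coin_x_return                     # -0.005
combined_position <- abs(coin_y_position) + abs(coin_x_position)  # 1.5
combined_return <- (coin_y_pnl + coin_x_pnl) / combined_position  # 0.015 / 1.5
print(combined_return)
```

Dividing by the gross position means the return is expressed per unit of capital deployed across both legs, not per unit of coin y alone.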

10. Backtest Strategy Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using an equally weighted portfolio of cointegrated coin pairs.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
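selected_pairs: A dataframe generated by select_pairs() that represents a set of cointegrated coin pairs.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.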

backtest_strategy <- function(train, test, selected_pairs, threshold_z) { 
  df <- tibble()  
  for (i in seq_len(nrow(selected_pairs))) { 
    single_pair <- tibble(
      return_pair = backtest_pair(train = train, 
                                  test = test, 
                                  coin_y = selected_pairs[["coin_y"]][i], 
                                  coin_x = selected_pairs[["coin_x"]][i], 
                                  threshold_z = threshold_z), 
      coin_y = selected_pairs[["coin_y"]][i], 
      coin_x = selected_pairs[["coin_x"]][i], 
      date_time = test[["date_time"]]
    )
    df <- bind_rows(df, single_pair)
  }
  df <- df %>% 
    group_by(date_time) %>% 
    summarise(return_strategy = mean(return_pair)) 
  return(df[["return_strategy"]])
} 
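
The aggregation step amounts to averaging the cumulative return curves of the pairs at each timestamp. A minimal sketch with two made-up curves (pair_a and pair_b are hypothetical):

```r
# Hypothetical cumulative returns for two pairs over three periods
return_pairs <- cbind(pair_a = c(1.00, 1.02, 1.05),
                      pair_b = c(1.00, 0.99, 1.01))

# Equal-weighted strategy curve: mean across pairs at each timestamp
return_strategy <- rowMeans(return_pairs)
print(return_strategy)  # 1.000 1.005 1.030
```

Averaging cumulative curves corresponds to splitting capital equally across the pairs at the start of the test window and not rebalancing afterwards.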

11. Plot Single Function

Description
Create plots of a cointegration-based mean reversion trading strategy for a single coin pair composed of coin y and coin x. The function creates two plots. The first plot displays the spread transformed into a z-score in blue, with three red lines at -2, 0, and 2. A green line indicates the signal, which can take the values -1, 0, and +1. The second plot displays the cumulative return of the model in blue. Two additional lines show the buy and hold return of coin y and coin x as red and green lines, respectively.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_single <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))  
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_plot <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           return_pair = backtest_pair(train = train, 
                                       test = test, 
                                       coin_y = coin_y, 
                                       coin_x = coin_x, 
                                       threshold_z = threshold_z), 
           return_buyhold_y = test[[coin_y]] / test[[coin_y]][1], 
           return_buyhold_x = test[[coin_x]] / test[[coin_x]][1]) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = spread_z, colour = "Spread Z"), size = 1) + 
          geom_line(aes(y = signal, colour = "Signal"), size = 0.5) + 
          geom_hline(yintercept = 0, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = 2, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = -2, colour = "red", alpha = 0.5) + 
          scale_color_manual(name = "Series", 
                             values = c("Spread Z" = "blue", 
                                        "Signal" = "green")) + 
          labs(title = "Spread vs Trading Signal", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Spread and Signal")) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = return_pair, colour = "Model"), size = 1) + 
          geom_line(aes(y = return_buyhold_y, colour = "Coin Y"), size = 0.5, alpha = 0.4) + 
          geom_line(aes(y = return_buyhold_x, colour = "Coin X"), size = 0.5, alpha = 0.4) + 
          geom_hline(yintercept = 1, colour = "black") + 
          scale_color_manual(name = "Return", 
                             values = c("Model" = "darkblue", 
                                        "Coin Y" = "darkred", 
                                        "Coin X" = "darkgreen")) + 
          labs(title = "Model Return vs Buy Hold Return", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Cumulative Return"))
} 

12. Plot Many Function

Description
Create many plots by calling the plot_single() function multiple times. Also creates a plot showing the results of the overall strategy. Creates a train and test set surrounding a cutoff date and creates plots for the top 10 selected coin pairs ranked by their ADF statistic.

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
cutoff_date: A date representing the cutoff date between the train and test sets.
train_window: A period object from the lubridate package representing the length of time the train set covers.
test_window: A period object from the lubridate package representing the length of time the test set covers.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_many <- function(pricing_data, time_resolution, cutoff_date, train_window, test_window, threshold_z) { 
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = as.Date(cutoff_date) - train_window, 
                        end_date = as.Date(cutoff_date)) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = as.Date(cutoff_date), 
                       end_date = as.Date(cutoff_date) + test_window) 
  selected_pairs <- select_pairs(train = train, coin_pairs = create_pairs()) 
  test <- test %>% 
    mutate(return_strategy = backtest_strategy(train = train, 
                                               test = ., 
                                               selected_pairs = selected_pairs, 
                                               threshold_z = threshold_z)) 
  print(selected_pairs) 
  for (i in seq_len(min(10, nrow(selected_pairs)))) { 
    plot_single(train = train, 
                test = test, 
                coin_y = selected_pairs[["coin_y"]][i], 
                coin_x = selected_pairs[["coin_x"]][i], 
                threshold_z = threshold_z)
  } 
  ggplot(test, aes(x = date_time)) + 
    geom_line(aes(y = return_strategy, colour = "Strategy"), size = 1) + 
    geom_line(aes(y = USDT_BTC / USDT_BTC[1], colour = "USDT_BTC"), size = 0.5, alpha = 0.4) + 
    geom_hline(yintercept = 1, colour = "black") + 
    scale_color_manual(name = "Return", 
                       values = c("Strategy" = "darkblue", 
                                  "USDT_BTC" = "darkred")) + 
    labs(title = "Strategy Return vs Buy Hold Return", 
         x = "Date", 
         y = "Cumulative Return") 
} 

13. Set Parameters

time_resolution <- 300 
train_window <- days(8) 
test_window <- days(4) 
test_by <- "4 days"
threshold_z <- 2 
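
As a quick sanity check on these settings, the sketch below works out roughly how many 5-minute observations fall in each window and lists the first few rolling windows. It uses plain day counts in base R instead of lubridate periods, and the names train_days, test_days, cutoffs, and windows are introduced here for illustration. The date range is the one used in the full cross validation.

```r
time_resolution <- 300  # seconds per observation
train_days <- 8
test_days <- 4

# Approximate number of observations per window
bars_train <- train_days * 86400 / time_resolution  # 2304
bars_test <- test_days * 86400 / time_resolution    # 1152

# Rolling cutoff dates, stepping forward by the length of the test window
cutoffs <- seq(as.Date("2017-01-01"), as.Date("2017-10-01"), by = "4 days")
windows <- data.frame(train_start = cutoffs - train_days,
                      cutoff = cutoffs,
                      test_end = cutoffs + test_days)
print(head(windows, 3))
```

Because the step between cutoffs equals the test window, consecutive test sets sit back to back, while consecutive training sets overlap by four days.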

14. Cross Validation September 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-09-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 13 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1 USDT_DASH  USDT_ZEC -6.304579
##  2  USDT_ZEC USDT_DASH -6.264606
##  3  BTC_DASH   BTC_ZEC -5.749083
##  4   BTC_ZEC  BTC_DASH -5.722446
##  5  USDT_ZEC  USDT_XMR -4.395168
##  6  USDT_XMR  USDT_ZEC -4.308629
##  7 USDT_DASH  USDT_XMR -4.087038
##  8  BTC_DASH   BTC_XMR -3.957694
##  9  USDT_XMR USDT_DASH -3.931789
## 10   BTC_ZEC   BTC_XMR -3.842900
## 11   BTC_XMR  BTC_DASH -3.760414
## 12   BTC_XMR   BTC_ZEC -3.680398
## 13   BTC_XEM   BTC_LTC -3.515241

15. Cross Validation August 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-08-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 42 x 3
##      coin_y   coin_x  adf_stat
##       <chr>    <chr>     <dbl>
##  1 USDT_ZEC USDT_REP -8.042947
##  2 USDT_REP USDT_ZEC -8.030561
##  3  BTC_REP  BTC_ZEC -6.581912
##  4  BTC_ZEC  BTC_REP -6.552723
##  5 USDT_LTC USDT_ETH -5.783812
##  6 USDT_ETH USDT_LTC -5.726767
##  7 USDT_ETH USDT_REP -5.186577
##  8 USDT_REP USDT_ETH -5.110959
##  9  BTC_LTC  BTC_ETH -5.031428
## 10 USDT_ETH USDT_ZEC -5.005152
## # ... with 32 more rows

16. Cross Validation July 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-07-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 54 x 3
##      coin_y   coin_x  adf_stat
##       <chr>    <chr>     <dbl>
##  1 USDT_LTC USDT_XMR -7.365617
##  2 USDT_XMR USDT_LTC -7.317278
##  3 USDT_BTC USDT_XMR -7.017733
##  4 USDT_BTC USDT_LTC -6.972132
##  5 USDT_LTC USDT_BTC -6.880453
##  6 USDT_XMR USDT_BTC -6.877235
##  7 USDT_REP USDT_LTC -6.462307
##  8 USDT_LTC USDT_REP -6.448773
##  9 USDT_BTC USDT_REP -6.017692
## 10 USDT_REP USDT_BTC -5.921481
## # ... with 44 more rows

17. Cross Validation June 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-06-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 46 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_REP  USDT_LTC -7.355843
##  2  USDT_LTC  USDT_REP -7.174622
##  3   BTC_XEM   BTC_REP -6.056084
##  4   BTC_XEM   BTC_XMR -5.815794
##  5   BTC_XEM   BTC_LTC -5.585049
##  6  USDT_XMR USDT_DASH -5.546174
##  7 USDT_DASH  USDT_XMR -5.377612
##  8  USDT_REP  USDT_XMR -5.352318
##  9  USDT_XMR  USDT_REP -5.266058
## 10  USDT_XMR  USDT_ZEC -5.233248
## # ... with 36 more rows

18. Cross Validation May 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-05-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 13 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_XMR USDT_DASH -4.030973
##  2 USDT_DASH  USDT_XMR -4.016446
##  3 USDT_DASH  USDT_ZEC -3.700797
##  4  USDT_ZEC USDT_DASH -3.685299
##  5  USDT_LTC USDT_DASH -3.677481
##  6  USDT_XMR  USDT_ZEC -3.649193
##  7  USDT_LTC  USDT_REP -3.631887
##  8  USDT_ZEC  USDT_XMR -3.618951
##  9  USDT_LTC  USDT_XMR -3.599347
## 10  USDT_LTC  USDT_ZEC -3.589323
## 11   BTC_LTC   BTC_XEM -3.545460
## 12  USDT_LTC  USDT_ETH -3.478439
## 13  USDT_LTC  USDT_BTC -3.464340

19. Cross Validation April 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-04-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 11 x 3
##      coin_y   coin_x  adf_stat
##       <chr>    <chr>     <dbl>
##  1 USDT_ZEC USDT_XMR -5.221770
##  2 USDT_XMR USDT_ZEC -5.220593
##  3  BTC_ZEC  BTC_XMR -4.857048
##  4  BTC_XMR  BTC_ZEC -4.828443
##  5 USDT_REP USDT_LTC -3.712762
##  6  BTC_ETH BTC_DASH -3.593486
##  7  BTC_ETH  BTC_XMR -3.586068
##  8 USDT_ETH USDT_BTC -3.480062
##  9 USDT_ETH USDT_REP -3.473069
## 10  BTC_ETH  BTC_ZEC -3.440235
## 11 USDT_ETH USDT_ZEC -3.437994

20. Cross Validation March 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-03-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 28 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
##  1 USDT_LTC  USDT_ZEC -6.483461
##  2 USDT_LTC  USDT_ETH -6.183461
##  3 USDT_LTC  USDT_XMR -5.549095
##  4 USDT_LTC  USDT_REP -5.464132
##  5 USDT_LTC USDT_DASH -5.429595
##  6 USDT_LTC  USDT_BTC -5.269985
##  7  BTC_LTC  BTC_DASH -4.760918
##  8  BTC_REP   BTC_LTC -4.498215
##  9  BTC_XEM   BTC_XMR -4.497070
## 10 BTC_DASH   BTC_LTC -4.352129
## # ... with 18 more rows

21. Cross Validation February 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-02-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 43 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
##  1 USDT_ETH  USDT_BTC -6.871740
##  2 USDT_ETH  USDT_LTC -6.758688
##  3 USDT_ETH  USDT_ZEC -6.733779
##  4 USDT_ETH  USDT_REP -6.430198
##  5 USDT_ETH  USDT_XMR -6.295559
##  6 USDT_ETH USDT_DASH -6.212736
##  7  BTC_REP   BTC_ETH -6.157639
##  8 USDT_REP  USDT_ZEC -5.771472
##  9  BTC_ETH   BTC_REP -5.594829
## 10 USDT_REP  USDT_XMR -5.547109
## # ... with 33 more rows

22. Cross Validation January 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-01-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 16 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
##  1  BTC_REP   BTC_LTC -5.170427
##  2  BTC_LTC   BTC_REP -5.069562
##  3 USDT_LTC  USDT_ZEC -4.513901
##  4 USDT_BTC  USDT_XMR -4.413103
##  5 USDT_XMR  USDT_BTC -4.411198
##  6 USDT_ZEC  USDT_ETH -4.053381
##  7 USDT_ZEC  USDT_LTC -4.044288
##  8 USDT_LTC  USDT_BTC -3.981324
##  9 USDT_LTC  USDT_XMR -3.868763
## 10  BTC_XEM   BTC_XMR -3.784262
## 11 USDT_LTC  USDT_ETH -3.592777
## 12  BTC_XEM  BTC_DASH -3.584556
## 13 USDT_ZEC USDT_DASH -3.546664
## 14  BTC_LTC   BTC_XMR -3.543821
## 15 USDT_ZEC  USDT_XMR -3.439142
## 16 USDT_LTC  USDT_REP -3.430087

23. Cross Validation Full

cutoff_dates <- seq(ymd("2017-01-01"), ymd("2017-10-01"), by = test_by)
results <- tibble() 
for (cutoff_date in cutoff_dates) { 
  # for() drops the Date class, so convert the day count back explicitly
  cutoff_date <- as.Date(cutoff_date, origin = "1970-01-01") 
  print(str_c("Cross validating strategy."))
  print(str_c("Using train set from ", cutoff_date - train_window , " to ", cutoff_date, ".")) 
  print(str_c("Using test set from ", cutoff_date, " to ", cutoff_date + test_window, "."))  
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = cutoff_date - train_window, 
                        end_date = cutoff_date) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = cutoff_date, 
                       end_date = cutoff_date + test_window) 
  test <- test %>% 
    mutate(return_strategy = backtest_strategy(train = train, 
                                               test = test, 
                                               selected_pairs = select_pairs(train = train, coin_pairs = create_pairs()), 
                                               threshold_z = threshold_z), 
           return_strategy_change = return_strategy / lag(return_strategy, 1) - 1) %>% 
    mutate_all(funs(ifelse(is.na(.), 0, .)))
  results <- bind_rows(results, test) 
} 
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-24 to 2017-01-01."
## [1] "Using test set from 2017-01-01 to 2017-01-05."
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-28 to 2017-01-05."
## [1] "Using test set from 2017-01-05 to 2017-01-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-01 to 2017-01-09."
## [1] "Using test set from 2017-01-09 to 2017-01-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-05 to 2017-01-13."
## [1] "Using test set from 2017-01-13 to 2017-01-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-09 to 2017-01-17."
## [1] "Using test set from 2017-01-17 to 2017-01-21."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-13 to 2017-01-21."
## [1] "Using test set from 2017-01-21 to 2017-01-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-17 to 2017-01-25."
## [1] "Using test set from 2017-01-25 to 2017-01-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-21 to 2017-01-29."
## [1] "Using test set from 2017-01-29 to 2017-02-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-25 to 2017-02-02."
## [1] "Using test set from 2017-02-02 to 2017-02-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-29 to 2017-02-06."
## [1] "Using test set from 2017-02-06 to 2017-02-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-02 to 2017-02-10."
## [1] "Using test set from 2017-02-10 to 2017-02-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-06 to 2017-02-14."
## [1] "Using test set from 2017-02-14 to 2017-02-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-10 to 2017-02-18."
## [1] "Using test set from 2017-02-18 to 2017-02-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-14 to 2017-02-22."
## [1] "Using test set from 2017-02-22 to 2017-02-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-18 to 2017-02-26."
## [1] "Using test set from 2017-02-26 to 2017-03-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-22 to 2017-03-02."
## [1] "Using test set from 2017-03-02 to 2017-03-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-26 to 2017-03-06."
## [1] "Using test set from 2017-03-06 to 2017-03-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-02 to 2017-03-10."
## [1] "Using test set from 2017-03-10 to 2017-03-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-06 to 2017-03-14."
## [1] "Using test set from 2017-03-14 to 2017-03-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-10 to 2017-03-18."
## [1] "Using test set from 2017-03-18 to 2017-03-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-14 to 2017-03-22."
## [1] "Using test set from 2017-03-22 to 2017-03-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-18 to 2017-03-26."
## [1] "Using test set from 2017-03-26 to 2017-03-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-22 to 2017-03-30."
## [1] "Using test set from 2017-03-30 to 2017-04-03."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-26 to 2017-04-03."
## [1] "Using test set from 2017-04-03 to 2017-04-07."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-30 to 2017-04-07."
## [1] "Using test set from 2017-04-07 to 2017-04-11."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-03 to 2017-04-11."
## [1] "Using test set from 2017-04-11 to 2017-04-15."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-07 to 2017-04-15."
## [1] "Using test set from 2017-04-15 to 2017-04-19."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-11 to 2017-04-19."
## [1] "Using test set from 2017-04-19 to 2017-04-23."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-15 to 2017-04-23."
## [1] "Using test set from 2017-04-23 to 2017-04-27."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-19 to 2017-04-27."
## [1] "Using test set from 2017-04-27 to 2017-05-01."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-23 to 2017-05-01."
## [1] "Using test set from 2017-05-01 to 2017-05-05."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-27 to 2017-05-05."
## [1] "Using test set from 2017-05-05 to 2017-05-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-01 to 2017-05-09."
## [1] "Using test set from 2017-05-09 to 2017-05-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-05 to 2017-05-13."
## [1] "Using test set from 2017-05-13 to 2017-05-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-09 to 2017-05-17."
## [1] "Using test set from 2017-05-17 to 2017-05-21."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-13 to 2017-05-21."
## [1] "Using test set from 2017-05-21 to 2017-05-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-17 to 2017-05-25."
## [1] "Using test set from 2017-05-25 to 2017-05-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-21 to 2017-05-29."
## [1] "Using test set from 2017-05-29 to 2017-06-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-25 to 2017-06-02."
## [1] "Using test set from 2017-06-02 to 2017-06-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-29 to 2017-06-06."
## [1] "Using test set from 2017-06-06 to 2017-06-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-02 to 2017-06-10."
## [1] "Using test set from 2017-06-10 to 2017-06-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-06 to 2017-06-14."
## [1] "Using test set from 2017-06-14 to 2017-06-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-10 to 2017-06-18."
## [1] "Using test set from 2017-06-18 to 2017-06-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-14 to 2017-06-22."
## [1] "Using test set from 2017-06-22 to 2017-06-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-18 to 2017-06-26."
## [1] "Using test set from 2017-06-26 to 2017-06-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-22 to 2017-06-30."
## [1] "Using test set from 2017-06-30 to 2017-07-04."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-26 to 2017-07-04."
## [1] "Using test set from 2017-07-04 to 2017-07-08."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-30 to 2017-07-08."
## [1] "Using test set from 2017-07-08 to 2017-07-12."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-04 to 2017-07-12."
## [1] "Using test set from 2017-07-12 to 2017-07-16."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-08 to 2017-07-16."
## [1] "Using test set from 2017-07-16 to 2017-07-20."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-12 to 2017-07-20."
## [1] "Using test set from 2017-07-20 to 2017-07-24."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-16 to 2017-07-24."
## [1] "Using test set from 2017-07-24 to 2017-07-28."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-20 to 2017-07-28."
## [1] "Using test set from 2017-07-28 to 2017-08-01."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-24 to 2017-08-01."
## [1] "Using test set from 2017-08-01 to 2017-08-05."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-28 to 2017-08-05."
## [1] "Using test set from 2017-08-05 to 2017-08-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-01 to 2017-08-09."
## [1] "Using test set from 2017-08-09 to 2017-08-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-05 to 2017-08-13."
## [1] "Using test set from 2017-08-13 to 2017-08-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-09 to 2017-08-17."
## [1] "Using test set from 2017-08-17 to 2017-08-21."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-13 to 2017-08-21."
## [1] "Using test set from 2017-08-21 to 2017-08-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-17 to 2017-08-25."
## [1] "Using test set from 2017-08-25 to 2017-08-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-21 to 2017-08-29."
## [1] "Using test set from 2017-08-29 to 2017-09-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-25 to 2017-09-02."
## [1] "Using test set from 2017-09-02 to 2017-09-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-29 to 2017-09-06."
## [1] "Using test set from 2017-09-06 to 2017-09-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-02 to 2017-09-10."
## [1] "Using test set from 2017-09-10 to 2017-09-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-06 to 2017-09-14."
## [1] "Using test set from 2017-09-14 to 2017-09-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-10 to 2017-09-18."
## [1] "Using test set from 2017-09-18 to 2017-09-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-14 to 2017-09-22."
## [1] "Using test set from 2017-09-22 to 2017-09-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-18 to 2017-09-26."
## [1] "Using test set from 2017-09-26 to 2017-09-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-09-22 to 2017-09-30."
## [1] "Using test set from 2017-09-30 to 2017-10-04."
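
The log above suggests that each fold trains on an 8-day window, tests on the 4 days that follow, and then rolls forward by 4 days. The sketch below reproduces the last three folds of that schedule. The helper make_windows() and its arguments are hypothetical names introduced for illustration; the cross-validation function used above may build its windows differently.

```r
# Hypothetical helper (not part of the original code): build rolling
# train/test windows where each fold advances by the test length.
make_windows <- function(first_train_start, last_test_end,
                         train_days = 8, test_days = 4) {
  # Last admissible train start leaves room for one full train + test window.
  train_start <- seq(as.Date(first_train_start),
                     as.Date(last_test_end) - (train_days + test_days),
                     by = paste(test_days, "days"))
  data.frame(train_start = train_start,
             train_end   = train_start + train_days,
             test_start  = train_start + train_days,  # test begins where train ends
             test_end    = train_start + train_days + test_days)
}

# Window lengths are assumptions read off the log output, not from the source code.
windows <- make_windows("2017-09-14", "2017-10-04")
print(windows)
```

With these assumed lengths, every test set falls strictly after its train set, so no fold is evaluated on data it was fitted to.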
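# Compound the per-period strategy returns into a cumulative return index (1 = break-even).
# date_time is assumed to be stored as seconds since 1970-01-01, so convert it back to POSIXct.
results <- results %>% 
  mutate(return_strategy_cumulative = cumprod(1 + return_strategy_change), 
         date_time = as.POSIXct(date_time, origin = "1970-01-01")) 
# Plot the cumulative strategy return; the horizontal line at 1 marks break-even.
ggplot(results, aes(x = date_time)) + 
  geom_line(aes(y = return_strategy_cumulative), colour = "blue", size = 1) + 
  geom_hline(yintercept = 1, colour = "black") + 
  labs(title = "Cumulative Strategy Return", x = "Date", y = "Cumulative Return") 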

print(results[["return_strategy_cumulative"]][nrow(results)]) 
## [1] 0.870272
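
A final value of 0.870272 means the strategy ended the cross-validated period about 13% below its starting capital. The sketch below shows the same compounding arithmetic on a toy return series. The vector toy_returns is made-up illustration data, not output from the strategy.

```r
# Toy per-period returns (hypothetical values, for illustration only).
toy_returns <- c(0.02, -0.05, 0.01)

# Compound the returns: each element is the value of 1 unit of starting capital.
cumulative <- cumprod(1 + toy_returns)

# Convert the final cumulative value into a total percentage return.
total_return_pct <- (cumulative[length(cumulative)] - 1) * 100

print(cumulative)
print(total_return_pct)
```

Applying the same conversion to the result above gives (0.870272 - 1) * 100, or roughly -12.97%.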